Architecture and Performance of the Hitachi SR2201 Massively Parallel Processor System
Abstract
RISC-based Massively Parallel Processors (MPPs) often show low efficiency in real-world applications because of cache miss penalties, insufficient memory system throughput, and poor inter-processor communication performance. Hitachi's SR2201, an MPP scalable up to 2048 processors and 600 GFLOPS peak performance, overcomes these problems by introducing three novel features. First, its processor, the 150 MHz HARP-1E, addresses the cache miss penalty through "pseudo vector processing" (PVP). In PVP, data is prefetched into a special register bank, bypassing the cache. Second, a multi-bank memory architecture that operates like a pipeline eliminates the memory system bottleneck. Third, inter-processor communication achieves high performance on the three-dimensional crossbar network by using a "remote DMA transfer" protocol and hardware-based cache coherency. As a result of these improvements, the SR2201 achieved 220.4 GFLOPS with 1024 processors in the LINPACK benchmark, which is almost 72% of the peak performance.
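The performance figures quoted above can be cross-checked with a few lines of arithmetic. The sketch below is illustrative only: it assumes each HARP-1E completes 2 floating-point operations per cycle, a value the abstract does not state, and the function names `peak_gflops` and `efficiency` are hypothetical. Under that assumption the 2048-processor peak comes out slightly above the rounded 600 GFLOPS, and the LINPACK result lands just under 72% of the 1024-processor peak.

```python
# Cross-check of the abstract's peak and LINPACK efficiency figures.
# ASSUMPTION (not stated in the abstract): 2 floating-point operations
# per clock cycle per processor.
CLOCK_MHZ = 150        # HARP-1E clock rate, from the abstract
FLOPS_PER_CYCLE = 2    # assumed


def peak_gflops(processors: int) -> float:
    """Theoretical peak in GFLOPS for a given processor count."""
    return processors * CLOCK_MHZ * FLOPS_PER_CYCLE / 1000.0


def efficiency(measured_gflops: float, processors: int) -> float:
    """Measured performance as a fraction of theoretical peak."""
    return measured_gflops / peak_gflops(processors)


if __name__ == "__main__":
    print(f"Peak, 2048 processors: {peak_gflops(2048):.1f} GFLOPS")
    print(f"Peak, 1024 processors: {peak_gflops(1024):.1f} GFLOPS")
    print(f"LINPACK efficiency:    {efficiency(220.4, 1024):.1%}")
```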
Related resources
An efficient implementation of parallel eigenvalue computation for massively parallel processing
This article describes an efficient implementation and evaluation of a parallel eigensolver for computing all eigenvalues of dense symmetric matrices. Our eigensolver uses a Householder tridiagonalization method, which has higher parallelism and performance than conventional methods when problem size is relatively small, e.g., on the order of 10,000. This is very important for relevant practical appl...
Parallel Java for the Hitachi SR 2201
In October of 1997, a year-long collaborative project was started between Hitachi Europe Limited (HEL) and the Edinburgh Parallel Computing Centre (EPCC) at the University of Edinburgh. This project had the goal of producing an environment whereby Java programs may be executed on the Hitachi SR2201 distributed memory multi-processor machine. The two key deliverables from this work are a port of...
Deadlock-Free Fault-tolerant Routing in the Multi-dimensional Crossbar Network and Its Implementation for the Hitachi SR2201
We have developed a hardware detour path selection facility for the Hitachi SR2201 parallel computer, which uses a multi-dimensional crossbar as an inter-processor network to ensure operating efficiency and high reliability when a part of the network is faulty. When this hardware facility is used, packets are transmitted to their destination along alternative paths to avoid the fault. However, ...
A Methodology for Automatically Tuned Parallel Tridiagonalization on Distributed Memory Vector-parallel Machines
In this paper, we describe an auto-tuning methodology for the parallel tridiagonalization to attain high performance. By searching the optimal set of three parameters for the performance, a highly efficient routine can be obtained automatically. Evaluation of the methodology on the distributed memory parallel machines, the HITACHI SR2201 and HITACHI SR8000, has been provided. The experimental res...
Effective Simulation for the Giga-scale Massively Parallel Supercomputer SR2201
A high performance parallel network simulation environment was developed in the SR2201 project. The SR2201 is one of the highest performance massively parallel supercomputers in the world. The enhanced simulation algorithm achieved a 2.4 times increase in simulation speed compared with conventional simulation methodology. A 98% detection rate for all design errors before physical design contrib...
Publication date: 1997